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INTRODUCTION .  This  article  considers  the  problem  of  determining 
the  optimal  value  and  corresponding  optimal  point  of  a  real 
function  F  in  m  variables.  Only  function  values  are  given  and 
the  computation  of  derivatives  is  either  not  practical  or  are 
not  available.  Many  techniques  were  developed  in  the  last  few 
years  which  do  not  use  derivatives  [2  ]  [4  ]  [14]  [15],  however 
some  turn  out  to  be  of  little  practical  use  when  applied  to 
problems  in  many  variables  and  nonlinear  in  nature.  The  main 
difficulty  is  slow  convergence  and  early  termination  of  the 
algorithm.  In  contrast,  methods  which  use  derivatives  might 
converge  faster  but  require  instead  an  immense  amount  of  calcula¬ 
tions  of  large  inverse  matrices  which  have  first  or  second 
partial  derivatives.  This  task  is  formidable  and  in  many  cases 
impossible  to  attain  [6  ]  [8  ].  Bremermann  (1970)  introduced 
an  ingeneous  and  useful  optimization  algorithm  that  is  guaranteed 
to  converge  for  polynomials  in  several  variables  up  to  fourth 
degree.  The  heart  of  this  method  is  the  use  of  random  directions 
of  search  together  with  a  Lagrangian  interpolation  scheme. 

This  author,  having  had  extensive  experience  with  this  algorithm, 
found  that  the  method  has  fast  convergence  at  the  early  stages 
and  tends  to  stagnate  in  the  neighborhood  of  the  optimal  point 
[1  ]  [9  ]  [12]. 

Motivated  by  the  usefulness  of  random  directions  it  is  the 
purpose  of  this  article  to  present  an  algorithm  based  on  the 
proper  use  of  interpolation  schemes;  (a)  Lagrangian  interpolations 
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(such  as  those  in  Bremermann's  methods);  (b)  spline  approximations 
with  variable  nodes;  (c)  pseudo  Newton  steps  using  the  spline 
derivatives  (not  the  function)  ;  together  with  a  search  procedure 
along  weighted  random  directions.  The  directions  are  chosen 
to  be  orthogonal  using  the  Gram  Schmidt  orthogonalization  pror 
cedure.  This  algorithm  was  extensively  used  for  problem  solving 
in  mathematical  biology/  chemical  kinetics,  and  general  dynamical 
systems. 

Perfect  line  searches  along  the  weighted  random  directions 
is  not  assumed,  however,  we  require  monotonicity  of  the  objective 
function  at  each  successful  iteration,  in  addition  close  estimates  of 
the  optimal  point  are  not  required.  The  method  presented  has 
the  virtue  of  being  able  to  solve  nonlinear  systems  in  many 
variables  which  are  very  commonly  encountered  in  the  area  of 
applied  mathematics. 

THE  PROBLEM. 

We  want  to  solve  the  following  problem: 

min{F(x)  j  x  6  3R^}  . 

We  assume  any  initial  estimate  x^  6  3Rn,  and  for  any  value  x  £  3Rn 
the  functional  value  of  F  is  well  defined. 

DESCRIPTION  OF  THE  METHOD. 

(1)  A  set  of  linearly  independent  vectors  is  formed,  one  of 
which  is  a  random  vector  R  »(r^,...,rn) .  That  is,  we  take 
the  standard  basis 

ek  *  (0, . . . ,1,0,. . .  0)  (1  at  the  k  place). 

We  look  for  the  entry  of  R  having  the  highest  value,  say  it 
appears  in  the  j  position.  Now  we  fop0,the  set  o£.,ve.ctors 

AIR  F0-\  H  ■’  ■  ■  • 

NCTIC?  '  ■'  * 

This  t 

>■  .■  - 

Distvt  .  ; 

MATTHEW  J .  .. 

Chief,  Technical  Infonaation  DiTieion 


-  3  - 


ek  *  (0, . . .  1/  0. . .  0)  k  f  j 


=  (r. 


r_)  • 
IQ 


k  =  j 


R  is  a  random  vector  whose  components  (r,...rn)  are  independr- 
ently  generated  from  a  Gaussian  distribution. 

(2)  We  use  the  Gram-Schmidt  orthogonalization  procedure  to 
obtain  a  set  of  orthogonal  vectors  v^,...  v^  ,  such  that 
<  v^.vW  >  =  0,  for  i  ?  j. 

(3)  A  matrix  D  is  constructed  such  that 


(xj)-” 


lx")-* 


If  x^  <  0  set  x^ 


1,  for  any  i. 


The  diagonal  elements  of  D  are  precisely  the  elements  of 

„(°)  =  ,x«»  „(0K 

x  x  ^  ,  .  .  .  ,  x  n  j  . 

(4)  Define  the  random  vectors 

w^>  -  [P_5*J  ^v*1*  i  -  0,  1,...,  n 

(5)  Choose  the  random  directions  as  the  first  directiQn 


of  search. 


5  . 


(6)  A  set  of  functional  values  {Fj}j=1  is  obtained  by  evaluating 

F  at  equidistant  collinear  points  y^  =  x^+2w^  ?y2«x^^+w^*  j 

y3  «  x<°>  ;y4  -  x(0)-w(1);  and  y5  =  x{0)-2w(1). 

2 

(7)  The  cubic  spline  S(X)  €  C  [y^,y^+1],  1  <  i  <  4,  approximates 
the  function  F  restricted  to  the  line  x^  +  Xw^. 

(8)  The  roots  of  S’ (X)  *0  are  found.  There  are  from  8  to  1 
possible  real  roots ,  \a,  l  »  !#...«  8. 
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(9)  Determine  the  minimum  functional  value  among 

F[x^+XjW^  ] ,  l  =  1,...,  8.  Denote  the  vector  which  corres¬ 
ponds  to  this  minimum  by  =  x^  +  X  w^  •  This  vector  is 

the  new  approximation  to  the  minimum. 

(10)  If  F[x^  ]  >  F[x^  3  we  generate  a  new  approximation  to 
F  by  using  a  four  degree  Lagrangian  interpolation  such  as  the 
one  used  in  Bremermann's  scheme. 

(11)  For  any  successful  step,  i.e.  F[x^  )  <  F[x 
perform  a  pseudo  Newton  random  step 

x(n)  „  x(n-l)  _  F[x(n~1) 1 

S»U) 

m 

(12)  Terminate  the  search  if 

(1)  F  <  and  II  x^-x^"1*  II  <  t2 
elfc2  are  pre-assigned  tolerances 

or 

(2)  A  predetermined  number  of  iterations  have  been 
executed. 

ANALYSIS  OF  THE  ALGORITHM. 

One  of  the  main  features  of  this  method  is  the  nature  of  the 
random  direction.  In  what  follows  we  will  analyze  the  search 
directions  and  their  properties. 

DEFINITION  Two  vectors  w^  and  w^  ,  i  /  j,  are  said  to  be 
duals  with  respect  to  the  product  (P^*  P*^)  of  two  diagonal 
matrices  P^  and  P^  having  positive  entries  if 

w(i)T(P(j)  P(i))w(j)  -  0  i  ?  j  i  »  1,...,  n 

j  *  1 f • • • 9 


DEFINITION  The  set  of  vectors  (w'  , ,  w'  ' )  are  mutually 

duals  if  for  any  pair  (w^,w^)  we  can  find  a  product, 

(p(j)  of  positive  diagonal  matrices  such  that 

w(i)T(p(j)  p(i))w(j)  .  o. 

There  are  many  ways  to  generate  dual  vectors.  We  form 
the  dual  vector  by  the  following  procedure. 

(I)  Choose  a  nonzero  vector  v^  €  3Rn. 

(ii)  Find  n-linearly  independent  orthogonal  vectors  using  the 
Gram-Schmidt  orthogonalization  procedure,  i.e.  if 
8  s  {T(l)|iM  v(n)}  then  <  v^*  ,v^*  >  *  0  i  ?  j, 
where  <  ,  >  stands  for  the  scalar  product. 

(Ill)  Set  w<i}  =  p<1)-JJv(i)  i  =  1,...,  n  where  P(l)  is  a  n*n 
positive  diagonal  matrix,  and  v^  6  fl. 

PROPOSITION  1  The  set  of  vectors  w^1*  ...  w*n^  defined  by 
w(i)  *  P^  Vi}  form  a  set  of  mutually  duals. 

PROOF  Let  w<«  ?  w^  .  Then  for  any  i  and  any  j 

w(i)T(p(j)  p<i))Vj)  =  „(!>%<«*  pCj)**  w(j) 

•  -  (p(i,'J*  v(i))T  P(i)H  v<i)) 

.  v(i)T  pd)"1*  p (i) **  ptj)*5  pij)*5*  v(j) 

.  v(i)T  VCJ> 

-  ■  0. 

THE  QUADRATIC  CASE 

If  F (x)  is  a  quadratic  function: 

F(x)  -  a  +  bTx  +#xTQx. 

x,b  €  Ha,  Q  is  positive  definite.  Then  minimizing  F  using 


line  searches  in  the  direction  of  mutually  dual  vectors  will  give 
us  an  indication  of  the  improvement  from  one  iteration  to  another 
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Let  *  P^v(i^  .  Then  the  sequence  of  iterates  generated 


by  the  algorithm  is 

x(n+l)  .  x(n)  +  x(n)w(n) 

or 


n  *  0,1,2, . . . 


,(n+l) 


.(0) 


+  l  X(n)w(n)  n  *  0,1,2,... 


i=0 


(n) 


where  X  satisfy  the  inequality 

F[x(l+1)]  <  F[x(l)]  i  =  0,1,2  ...  n  . 

The  restriction  of  F  on  the  line  x^+X^w^  is  given  by 
F[x(k)+X(k)w(k>  ]  =*Ftx(k)]  +  2X(k'w(k)  [Qx(k)+b]  +  (X(k)  )2[w(k)  ]TQw 


and  the  minimum  is  given  by 


3p 


3X 


WT 


0 .  That  is 


23X 'k' 
which  imply  that 


13F  =  w(k)lQx(k)+b]  +  Xtk>(w<k>)TQw<k> 


.00  e  _  w(k)  [Qx(k)+b] 


(w'k))TQw(k)  * 

The  miniimim  F  value  in  the  direction  is  given  by 


F 


r ... (k)  i  2w{k)  tQx(lc)+b]  __(k)  [Qx(k)  +b]  + 

PU  1  rw‘k>?ow(kr 


&&&■) 


F  -  Ftx<k)] 


F. 


min 


Ftx(k)]  -  QWW) 


<k),Jfr„<W,T  „„(k> 


(k> 


(k)..(k) 


MEASURE  OF  IMPROVEMENT 

The  ratio  F[x^+1^  ]  /  Flx^M  is  defined  as  the  improvement 

(k) 

factor  for  direction  w  ,  [  1  ] .  This  ratio  gives  an  estimate 
of  rate  of  convergence  of  the  algorithm. 

Denote  by  0^  the  improvement  factor  for  the  direction, 

F[x(i)  ] 

i.e.  6.  =  - rj — 5-t — .  Then  a  measure  of  the  rate  of  improvement 

1  F[xU  ] 

*  pfx(k)  1 

after  k  iterations  is  given  by  0.  =  — - — rr  ■■  ■_  . 

*  F[xlk~1T] 

PROPOSITION  2.  The  minimum  of  F  is  equal  to 


F  . _  =  Ftx'u' ]  n  9. . 


The  ratio  F  ^  n/F[x  3  is  a  measure  of  the  maximum  improvement 


factor . 


PROPOSITION  2.  If  F[x]  >0  Vx  6  3R  then 


Log  F 


nan 


k 


l 

i«l 


Log  0^ 

Log  F[x(0) ] 


THE  INTERPOLATION  STEP 

We  use  two  interpolation  schemes :  (A)  Five  points  Lagrangian 

interpolation,  (B)  Cubic  spline  interpolation  with  variable 
inodes.  The  Lagrangian  step  is  extremely  efficient  in  the  search 
for  the  minimum  in  the  early  stages  while  the  spline  converges 
faster  when  in  the  proximity  of  the  minimum. 

Let  h  =  max  II  x^+1^-x^  II  be  the  maximum  step  size  between 
the  interpolated  points  and  denote  by  Etx]  ■  F(x)-S(x)  the  error 
between  the  cubic  spline  approximation  and  F(x).  S(x)  is  the 
cubic  spline  .  The  Etx]  and  its  derivative  E'[x]  are  bounded  by 

b  ^ 

I Etx] I  <  h3/2  (  |  tF"(t)2dt])  VX  €  [a,b] 

a 

([a,b]  is  the  interval  of  approximation) 
and 

b  h 

lE'tx]  <  h1*  (  |  tF"(t)]2dt)  Vx  6  [a,b] . 
a 

Thus  Etx]  and  E'tx]  are  bounded  by  an  expression  proportional 
to  h3/^2  and  h**  respectively  t  7  3 . 

In  the  iteration  the  choice  of  h  in  our  method  is 
given  by 

h  ■  min(l  ,  \/  F[xw  ]) 

Therefore  for  F  <  1  the  inte^al  of  the  cubic  approximation  over 


n  points  is  of  length  (n-1)  \/  Fix'  '  ] .  If  n  increases  and 
h  -»  0  then  S(x)  and  S'  (x)  converge  uniformly  to  F(x)  and  F' (x) 
respectively  [  7  ] . 


When  S' [a]  approximates  F'ta]  for  a  6  [a,b],  we  can  calculate 
S' (x)  and  apply  a  Newton-  Raphson  step  using  S'(x)  instead  of 
F' (x)  (which  is  unknown) .  That  is 


X<n+D 


(n) 


F[x(n) ] 
S' [x(n) j 


for  n  =»  1 , . . . ,  s 


The  errors  for  the  Lagrangian  interpolation  are  extensively 
analyzed  in  [  7 ]  [  1 ] . 


RESULTS . 

We  have  solved  standard  test  problems  with  our  algorithm. 
These  functions  were 

(1)  Rosenbroock;  Parabolic  Valley  [ 8  1 
f(x)  =  100(x2-x^)2  +  (1  -  x,)2 

(2)  Powell's  singular  of  4  variables  [14] 

f (x)  *  (Xj+IO.Xj)2  +  5(x3-x4)2  +  (x2-2x3)4  +  10(Xj-x4)4 

(3)  Rosenbroock  Parabolic  Valley  -  50  Dimensional  [13] 


F[x]  » 


i=»l'  l 


(i)  2 

(  L)  X(i+D^2 


29 


♦  i  (x*1*  -  0.5  x<i+1>)2 
i«25 


+  3  Y  (x<»  -  <‘1)  x 

i“30  '  • 

+  )  4 


(i) 


♦  2  Y  (x'i>-x(i+l>’)2 

i=20  ' 

^\2.22l  (x^-x'51’1)4 

/  3  si 


(4)  Zangwill  (1967) 

**»-•  * 

Fix]  =  (^-)  I16x(2)+16x(2)-8x1x2-56x1-256x2+991] 


In  addition  to  these  functions  the  method  was  tested  on 
nonlinear  systems  defined  by  ordinary  differential  equations, 
among  them;  a  model  of  gluconeogenesis  described  by  twenty 
differential  equations  having  31  unknown  parameters  [12],  an 
inducible  system  model  having  14  differential  equations  having 
12  unknown  parameters  HO  ] ,  and  an  ecological  model  with  8 
unknown  parameters  [11] .  In  all  of  the  dynamical  systems  the 
optimization  was  performed  to  find  least  square  minimum  of  a 
function  with  noise  in  the  data  [11].  The  following  table 
gives  a  summary  of  the  numerical  results*. 


Function 

#  of 

Variables 

#  of  Function 
Evaluations** 

#  of  Spline 
Iteration 

(1) 

Rosenbroock 

2 

1310 

25 

(2) 

Zangwill 

2 

64 

5 

(3) 

Powell  Singular 

4 

302 

7 

(4) 

Rosenbroock 

50 

571 

6 

(5) 

Gluconeogenesis  Model 

31 

1176 

39 

1X6) 

Gluconeogenesis  Model 

12 

885 

28 

(7) 

Inducible  System 

12 

1300 

40 

(8) 

Ecological  Model 

8 

370 

16 

*  The  computations  were  done  on  a  PDP-11  computer  and  the  termina- 
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tion  criteria  for  the  functions  1-4 ,  was  e  =  10  ,  and  for  systems 

5-8  the  minimum  of  F  was:  (#  of  noisy  data  points) x (Relative  value 
of  data  point  +  assumed  level  of  noise) . 

**  Includes  all  function  evaluations  of  Fy  spline  interpolation, 
Lagrangian,  and  pseudo  Newton. 

tThis  is  a  linear  model  described  by  6  differential  equations  having  12  unknown 
parameters. 


CONCLUSIONS 


In  this  article  we  have  presented  a  method  for  unconstrained 
optimization  which  is  based  on  interpolation  schemes  and  random 
weighted  direction.  The  algorithm  is  ‘the  result  of  the  author's 
extensive  experience  with  Bremermann's  methods  based  on  Lagrangian 
interpolation.  No  one  method  can  be  optimum  in  the  sense  of  being 
best  for  all  functions  (or  even  for  a  given  function) ,  moreover, 
it  is  unlikely  to  find  a  method  which  works  well  in  all  regions, 
far  from  the  minimum  as  well  as  near.  In  this  article  we  con¬ 
fronted  this  problem  precisely  by  using  the  Lagrangian  while  far 
away  from  the  minimum  and  the  spline  while  in  its  vicinity.  The 
direction  of  search  is  always  weighted  by  the  best  present  vector 
value,  and  because  of  the  randomness  the  algorithm  is  insensitive 
to  the  change  in  the  local  geometry  of  the  problem.  The  method 
is  well  suited  to  optimize  functions  for  which  their  derivatives 
are  unknown  and  cannot  be  computed  accurately.  From  the  numerical 
results  we  can  see  the  effectiveness  of  the  method  on  problem  with 
many  variables,  and  it  is  clear  that  the  higher  the  dimension  the 
more  effective  is  the  algorithm. 

Additional  applications  of  this  algorithm  in  the  field  of 
nonlinear  chemical  kinetics,  and  in  determining  multiple  equilibrium 
states  of  dynamical  systems  will  be  reported  soon. 
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